9 research outputs found

Scoredist: A simple and robust protein sequence distance estimator

Author: Hollich Volker
Sonnhammer Erik LL
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Distance-based methods are popular for reconstructing evolutionary trees thanks to their speed and generality. A number of methods exist for estimating distances from sequence alignments, which often involves some sort of correction for multiple substitutions. The problem is to accurately estimate the number of true substitutions given an observed alignment. So far, the most accurate protein distance estimators have looked for the optimal matrix in a series of transition probability matrices, e.g. the Dayhoff series. The evolutionary distance between two aligned sequences is here estimated as the evolutionary distance of the optimal matrix. The optimal matrix can be found either by an iterative search for the Maximum Likelihood matrix, or by integration to find the Expected Distance. As a consequence, these methods are more complex to implement and computationally heavier than correction-based methods. Another problem is that the result may vary substantially depending on the evolutionary model used for the matrices. An ideal distance estimator should produce consistent and accurate distances independent of the evolutionary model used. RESULTS: We propose a correction-based protein sequence estimator called Scoredist. It uses a logarithmic correction of observed divergence based on the alignment score according to the BLOSUM62 score matrix. We evaluated Scoredist and a number of optimal matrix methods using three evolutionary models for both training and testing (Dayhoff, Jones-Taylor-Thornton, and Müller-Vingron), as well as Whelan and Goldman solely for testing. Test alignments with known distances between 0.01 and 2 substitutions per position (1–200 PAM) were simulated using ROSE. Scoredist proved as accurate as the optimal matrix methods, yet substantially more robust. When trained on one model but tested on another, Scoredist was nearly always more accurate.
The Jukes-Cantor and Kimura correction methods were also tested, but were substantially less accurate. CONCLUSION: The Scoredist distance estimator is fast to implement and run, and combines robustness with accuracy. Scoredist has been incorporated into the Belvu alignment viewer, which is available at
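
As a rough illustration of what "correction-based" means in the abstract above, the following Python sketch computes the observed divergence of two aligned sequences and applies the standard Jukes-Cantor (generalised to 20 residue types) and Kimura protein corrections. The last function only shows the general shape of a logarithmic correction applied to a normalised alignment score; its normalisation, parameter names, and scale factor are assumptions made for this example and are not the published Scoredist formula.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed divergence: fraction of ungapped aligned positions that differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_protein(p: float, states: int = 20) -> float:
    """Jukes-Cantor correction for `states` equally frequent residue types."""
    b = (states - 1) / states
    if p >= b:
        return math.inf  # saturated: the correction is undefined
    return -b * math.log(1.0 - p / b)


def kimura_protein(p: float) -> float:
    """Kimura's empirical protein correction: d = -ln(1 - p - 0.2 * p^2)."""
    x = 1.0 - p - 0.2 * p * p
    if x <= 0.0:
        return math.inf  # saturated: the correction is undefined
    return -math.log(x)


def log_score_distance(score: float, random_score: float,
                       self_score: float, scale: float = 1.0) -> float:
    """Hypothetical score-based log correction (illustrative, NOT Scoredist).

    The alignment score is rescaled so that the expected score of unrelated
    sequences maps to 0 and the score of identical sequences maps to 1; the
    distance is the negative log of that value times an assumed scale factor.
    """
    norm = (score - random_score) / (self_score - random_score)
    if norm <= 0.0:
        return math.inf
    return -scale * math.log(norm)


if __name__ == "__main__":
    p = p_distance("ACDEFGHIKL", "ACDEFGHIKV")
    print(p, jukes_cantor_protein(p), kimura_protein(p))
```

Both classical corrections return a distance slightly larger than the observed divergence, which reflects the multiple substitutions that a raw mismatch count cannot see.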

Springer - Publisher Connector

PubMed Central

Pfam: clans, web tools and services

Author: Bateman Alex
Durbin Richard
Eddy Sean R
Finn Robert D
Griffiths-Jones Sam
Hollich Volker
Khanna Ajay
Lassmann Timo
Marshall Mhairi
Mistry Jaina
Moxon Simon
Schuster-Böckler Benjamin
Sonnhammer Erik L L
Publication venue: Oxford University Press (OUP)
Publication date: 28/12/2005

Pfam is a database of protein families that currently contains 7973 entries (release 18.0). A recent development in Pfam has enabled the grouping of related families into clans. Pfam clans are described in detail, together with the new associated web pages. Improvements to the range of Pfam web tools and the first set of Pfam web services that allow programmatic access to the database and associated tools are also presented. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://pfam.cgb.ki.se/)

CiteSeerX

Crossref

PubMed Central

Oxford University Research Archive

The University of Manchester - Institutional Repository

University of East Anglia digital repository

Orthology and protein domain architecture evolution

Author: Hollich Volker
Publication venue: Institutionen för cell- och molekylärbiologi (CMB) / Department of Cell and Molecular Biology
Publication date: 12/06/2006

A major factor behind protein evolution is the ability of proteins to evolve new domain architectures that encode new functions. Protein domains are widely considered to constitute the "atoms" of protein chains, acting as building blocks of proteins as well as evolutionary units. A small number of domains are found in many different domain combinations, while the majority of domains co-occur with very few types of other domains. Domain architectures are not necessarily created only once during evolution. Cases of convergent evolution show how a favourable domain architecture has evolved multiple times independently. A basic concept for understanding evolution at the gene level is orthology. Two genes are orthologous if they have evolved from the same gene in the last common ancestor of the species and have thus been created by a speciation event. Paralogous genes result from a duplication event that produced two gene copies within the same species. The concept of orthology can be transferred from genes to protein domains and utilised to explain recombination of protein domains and the evolution of domain architectures. The focus of this work is to augment the understanding of domain architecture evolution and its functional implications. We have examined, evaluated and improved existing methods as well as developed new approaches. The concept of orthology plays a major role in this work. Orthology is often inferred from phylogenetic trees that are based on pairwise distance estimations of protein sequences. The Scoredist protein sequence distance estimator has been developed as one part of this thesis. It combines robustness with low computational complexity and can be calibrated towards various evolutionary models. Accurate phylogenetic trees are crucial for many applications, hence the appropriate tree reconstruction algorithm should be chosen with care.
The strengths and weaknesses of many current tree reconstruction algorithms were assessed, and findings underscore the value of the Scoredist estimator. The Pfam protein families database comprises a large number of protein families and domains. As part of this thesis it has been enhanced by search and query tools, such as PfamAlyzer or the browser-based domain query, that can be applied to whole domain architectures instead of individual domains only. We have developed a Maximum Parsimony algorithm for the prediction of ancestral domain architectures. In contrast to previous approaches, it employs gene trees rather than species trees. The algorithm was a starting point for an extensive study of the domain architectures present in Pfam for 50 completely sequenced species. Sampling widely across the kingdoms of life, the study sought to find and analyse cases where a domain architecture had been created multiple times. The algorithm proved robust to potential biases from horizontal gene transfer. Convergent evolution of domain architectures was found more frequently than by previous approaches. No strong biases driving convergent evolution were found. It therefore seems to be a random process in much the same way as evolution through duplication and recombination, yet less frequent.
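
To make the idea of parsimony-based ancestral reconstruction on a gene tree concrete, here is a minimal Python sketch of the textbook Fitch small-parsimony pass. It is not the algorithm developed in the thesis; the tree encoding (nested 2-tuples of leaf names), the gene names, and the treatment of whole domain architectures as unordered discrete states are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a gene name, an inner node is a 2-tuple.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, leaf_states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass on a rooted binary tree.

    Returns the candidate state set at the root of `tree` and the minimum
    number of state changes needed to explain the leaf states.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


if __name__ == "__main__":
    # Toy gene tree in which architecture "A-B" occurs in two distant genes.
    gene_tree: Tree = ((("g1", "g2"), "g3"), ("g4", "g5"))
    architectures = {"g1": "A-B", "g2": "A", "g3": "A", "g4": "A-B", "g5": "A"}
    root_states, changes = fitch(gene_tree, architectures)
    print(root_states, changes)
```

In this toy tree the most parsimonious root architecture is "A" with two changes, so "A-B" is inferred to have arisen twice independently, which is the kind of pattern the abstract describes as convergent evolution.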

Publications from Karolinska Institutet

Domain tree-based analysis of protein architecture evolution

Author: Forslund Kristoffer
Henricson Anna
Hollich Volker
Sonnhammer Erik L. L
Publication date: 10/05/2022

Dalat University Library (Thư viện trường Đại học Đà Lạt)

† These authors contributed equally to this work. RESEARCH ARTICLE Contact:

Author: Anna Henricson
Erik L. L. Sonnhammer
Erik Sonnhammer
Kristoffer Forslund
Volker Hollich

protein domain architecture evolution MBE Advance Access published November 19, 2007 Running head: Analysis of protein architecture evolution

CiteSeerX

The Pfam protein families database

Author: Bateman Alex
Coin Lachlan
Durbin Richard
Eddy Sean R.
Finn Robert D.
Griffiths-Jones Sam
Hollich Volker
Khanna Ajay
Marshall Mhairi
Moxon Simon
Sonnhammer Erik L. L.
Studholme David J.
Yeats Corin
Publication venue: Oxford University Press
Publication date: 01/01/2004

Pfam is a large collection of protein families and domains. Over the past 2 years the number of families in Pfam has doubled and now stands at 6190 (version 10.0). Methodology improvements for searching the Pfam collection locally as well as via the web are described. Other recent innovations include modelling of discontinuous domains allowing Pfam domain definitions to be closer to those found in structure databases. Pfam is available on the web in the UK (http://www.sanger.ac.uk/Software/Pfam/), the USA (http://pfam.wustl.edu/), France (http://pfam.jouy.inra.fr/) and Sweden (http://Pfam.cgb.ki.se/)

The University of Manchester - Institutional Repository

University of East Anglia digital repository

University of Queensland eSpace